Content and Link Structure Analysis for Searching the Web

Authors

  • Kemal Efe
  • Vijay Raghavan
  • Arun Lakhotia
Abstract

Automated search engines continuously discover, index, and store information about web pages. When a user issues a query, this repository is searched to find a result set of the most relevant pages. An ideal search scheme must satisfy two basic requirements: high recall and high precision. Recall measures the ability of an algorithm to find as many relevant pages as possible. Precision measures the ability of an algorithm to reject as many nonrelevant pages as possible. An ideal search algorithm should find all of the relevant pages, rank them by relevance to the user query, and present a rank-ordered result to the user. The earlier generations of search engines relied solely on keyword matching to perform the search. Unfortunately, this approach did not work very well: too many nonrelevant pages were returned along with relevant ones, and their rankings rarely agreed with users' interests. Since user queries are short, usually consisting of 2-3 words, the problems associated with synonymy and polysemy make it particularly difficult to evaluate which pages will be of interest to a user. The user is more likely to be interested in a page if it contains authoritative information on its subject and it is relevant to the user query. Authoritative pages are usually cited by others frequently, and the link
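The two requirements named in the abstract, recall and precision, can be illustrated with a minimal sketch. It assumes the standard set-based definitions (precision as the fraction of returned pages that are relevant, recall as the fraction of relevant pages that are returned), which the abstract describes only informally. The function name `precision_recall` and the page identifiers are hypothetical and chosen for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: pages returned by the search algorithm (hypothetical IDs)
    relevant:  pages judged relevant to the query (hypothetical IDs)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant pages that were actually returned

    # Precision: share of the result set that is relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of all relevant pages that were found.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: four returned pages, three relevant pages.
result_set = ["p1", "p2", "p3", "p4"]
relevant_pages = ["p1", "p3", "p5"]
p, r = precision_recall(result_set, relevant_pages)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this example two of the four returned pages are relevant, and two of the three relevant pages are found, so a keyword matcher that returns many nonrelevant pages would score low on precision even when its recall is acceptable.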


Similar resources

Shear-Flexural Interaction in Analysis of Reduced Web Section Beams using VM Link Element

Reduced web section beams in shear-yielding moment-resistant steel frames are used to dissipate earthquake energy. Finite element analysis indicates that the failure mode of these beams is governed by the combination of shear force and flexural moment. Therefore, the analysis of frames with reduced web section beams needs to consider shear-flexural interaction in those sections. In ...


Searching for web communities through site level link and content analysis

In recent years, heavy user growth in Web 2.0 applications such as blogs and online social networks has helped support studies into classifying various web communities. Researchers have made many interesting finds, e.g. identifying subgroups within large "blogospheres." However, there has been less focus starting from a small community of sites to find other sites that are potentially part of t...


Lexical and semantic clustering by Web links

Recent Web searching and mining tools are combining text and link analysis to improve ranking and crawling algorithms. The central assumption behind such approaches is that there is a correlation between the graph structure of the Web and the text and meaning of pages. Here I formalize and quantitatively validate two conjectures drawing connections from linkage information to lexical and semant...


A Survey Paper of Structure Mining Technique using Clustering and Ranking Algorithm

A survey of various link analysis and clustering algorithms, such as PageRank, Hyperlink-Induced Topic Search, Weighted PageRank based on Visit of Links, K-Means, and Fuzzy K-Means. Among the ranking algorithms illustrated, Weighted PageRank is more efficient than Hyperlink-Induced Topic Search, whereas among the clustering algorithms described, Fuzzy Soft Rough K-Means is a mixture of Rough K-Means and fuzzy soft...


The Content and Structure of Electronic Personal Health Records: A Systematic Review

Introduction: The electronic Personal Health Record (ePHR) improves people’s awareness and care management and leads to health promotion. One of the most important factors that contributes to the development of ePHR is identifying and understanding its content and structure. No comprehensive studies have so far been performed on the content and structure of ePHRs. Therefore, the purpose of this...





Publication date: 2004